Agent Hospital: Autonomous Healthcare Simulation

Updated 17 March 2026

Agent Hospital is a simulation system where patients, clinicians, administrators, and resources are modeled as autonomous agents to realistically emulate hospital workflows.
It employs multi-agent interactions, digital twins, and iterative learning loops to refine diagnostic, administrative, and operational performance.
These systems offer practical applications in scheduling, epidemic control, and EHR reasoning, providing high-fidelity benchmarks for both research and practice.

Agent Hospital refers to a class of hospital simulation and automation systems in which all participants—patients, clinicians, administrators, and supporting resources—are modeled as autonomous agents, often powered by LLMs or other AI frameworks. Agent Hospital platforms implement end-to-end clinical, administrative, and operational workflows in simulated or real environments, facilitating research in medical AI, healthcare operations, and multi-agent collaboration. Modern Agent Hospital systems range from high-fidelity digital twins and controllable evaluation environments, to fully executable simulacra in which agents learn, reason, and evolve by interacting with large-scale synthetic or procedural hospital environments.

1. Conceptual Foundations and Motivations

Agent Hospital systems arise from the convergence of agent-based modeling, advances in autonomous LLM-based agents, and the need for closed-loop evaluation and evolution of medical AI. Key motivations include:

Lack of robust, scalable environments for lifelong experiential learning by clinical AI agents.
Desire to replace or augment predefined rule-based simulation with adaptive, realistic agent behaviors spanning triage, diagnosis, treatment, and administration.
Need for controllable benchmarks and digital twins to rigorously evaluate clinical pathway performance, workflow bottlenecks, and operational interventions.
Elimination of the dependency on large manually annotated datasets through self-supervised “learning on the job” via simulated patient–provider interactions.

Agent Hospitals typically integrate multiple agent roles (doctor, patient, nurse, administrator), tool or skill invocation engines, stateful memory mechanisms, and full-stack procedural environments or sandboxes (Li et al., 2024, Zhu et al., 11 Dec 2025, Yang et al., 12 Mar 2026, Fan et al., 2024).

2. Architectures and Agent Design Patterns

Agent Hospital architectures are characterized by multi-agent interaction protocols embedded in a simulated or embodied hospital environment. Central design patterns include:

Full Hospital Simulacra: As in "Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents", all patients, nurses, and doctors are realized as autonomous LLM-powered agents. The system is instantiated as a discrete-event, spatially explicit environment (e.g., a 2D tiled sandbox with 16 functional medical areas) (Li et al., 2024).
Administrative Multi-Agent Systems: Frameworks such as H-AdminSim focus on simulating complex real-world administrative workflows, including intake, scheduling, rescheduling, and cancellations, adhering to FHIR interoperability standards (Lee et al., 5 Feb 2026).
Agentic Operating Systems: The "Agentic Operating System for Hospital" (built on OpenClaw) introduces Linux user-namespace isolation, document-centric message passing, manifest-guided memory, and curated medical skills, enabling secure and auditable multi-agent workflow coordination (Yang et al., 12 Mar 2026).
Evaluation Environments: CP-Env and AI Hospital provide procedural, branchable agentic environments for systematic evaluation of medical agents on multi-step clinical pathways and doctor–patient dialogues, with ground-truth tracking and performance rubrics (Zhu et al., 11 Dec 2025, Fan et al., 2024).

Agent designs encompass a wide spectrum, from LLM-based dialogue and reasoning agents, to scheduling and resource-matching agents, to tool-using planner–caller paradigms for robust workflow execution (e.g., the Qwen 2.5 7B multi-agent medical assistant) (Gawade et al., 7 Mar 2025).
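
The planner–caller paradigm mentioned above can be illustrated with a minimal sketch. It is not the implementation of any cited system: the planner here is a hand-written rule standing in for an LLM, and the tool names (`lookup_slots`, `book_slot`), the `SCHEDULE` data, and the `run` loop are all hypothetical. The point is only the separation of concerns: the planner decides the next step from the history of observations, and the caller validates and executes that step against a tool registry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    tool: str
    args: dict


# Hypothetical stubbed schedule; a real system would query a scheduling backend.
SCHEDULE: Dict[str, List[str]] = {"cardiology": ["09:00", "10:30"], "radiology": ["14:00"]}


def lookup_slots(department: str) -> List[str]:
    return list(SCHEDULE.get(department, []))


def book_slot(department: str, time: str) -> str:
    return f"booked {department} at {time}"


TOOLS: Dict[str, Callable[..., object]] = {"lookup_slots": lookup_slots, "book_slot": book_slot}


def plan(request: dict, history: List[object]) -> Optional[ToolCall]:
    """Rule-based stand-in for the planner LLM: choose the next step or stop."""
    dept = request["department"]
    if not history:
        return ToolCall("lookup_slots", {"department": dept})
    if len(history) == 1 and history[0]:
        return ToolCall("book_slot", {"department": dept, "time": history[0][0]})
    return None  # nothing left to do


def call(step: ToolCall) -> object:
    """Caller: validate the tool name, then execute it with the planned arguments."""
    if step.tool not in TOOLS:
        raise KeyError(f"unknown tool: {step.tool}")
    return TOOLS[step.tool](**step.args)


def run(request: dict, max_steps: int = 5) -> List[object]:
    """Alternate planning and calling until the planner stops or the budget is spent."""
    history: List[object] = []
    for _ in range(max_steps):
        step = plan(request, history)
        if step is None:
            break
        history.append(call(step))
    return history
```

Calling `run({"department": "cardiology"})` yields the list of open slots followed by a booking confirmation, while a department with no open slots ends after the lookup step.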

3. Learning and Evolutionary Methodologies

Several Agent Hospital systems leverage online or experiential learning, in which agent policies are continuously refined through interaction:

Self-Evolution via MedAgent-Zero: Agent Hospital supports a parameter-free, iterative learning loop where doctor agents consult with patient agents, receive rewards for examination/diagnosis/treatment correctness, and automatically distill error-driven clinical principles for future retrieval. Correct examples and distilled principles are stored in memory libraries and used as guiding context in downstream consultations (Li et al., 2024).
Behavioral Adaptation by Experience Replay: In scenarios such as dual-objective epidemic management (H2-MARL), agent policies are updated using experience replay buffers mixed with heuristically generated "expert" trajectories (Luo et al., 13 Mar 2025).
Retrieval-Augmented Reasoning: EHRAgent generalizes few-shot code generation for EHR QA by retrieving top-K similar solved cases from long-term memory, incrementally debugging erroneous code plans via error feedback and dedicated LLM debugging modules (Shi et al., 2024).
Dynamic Multi-Agent Triage: Multi-agent systems orchestrate structured inquiry, recipient transformation, and department guidance with per-round updates to candidate diagnostic sets, pattern matching, and entropy-minimizing information gain (Cheng et al., 30 Jul 2025).
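
The entropy-minimizing inquiry step described for dynamic triage can be sketched as follows. This is a generic expected-information-gain calculation, not the cited system's algorithm: it assumes yes/no questions, a prior over candidate departments or diagnoses, and a hypothetical table giving the probability of a "yes" answer under each candidate. The question whose answer is expected to leave the least uncertainty is asked next.

```python
import math
from typing import Dict


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def posterior(prior: Dict[str, float], p_yes: Dict[str, float], answer_yes: bool) -> Dict[str, float]:
    """Bayes update of the candidate set after one yes/no answer."""
    unnorm = {c: prior[c] * (p_yes[c] if answer_yes else 1.0 - p_yes[c]) for c in prior}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()} if z > 0 else dict(prior)


def expected_entropy(prior: Dict[str, float], p_yes: Dict[str, float]) -> float:
    """Entropy of the posterior, averaged over the two possible answers."""
    prob_yes = sum(prior[c] * p_yes[c] for c in prior)
    total = 0.0
    for answer, prob in ((True, prob_yes), (False, 1.0 - prob_yes)):
        if prob > 0:
            total += prob * entropy(posterior(prior, p_yes, answer))
    return total


def best_question(prior: Dict[str, float], questions: Dict[str, Dict[str, float]]) -> str:
    """Pick the question with the lowest expected posterior entropy."""
    return min(questions, key=lambda q: expected_entropy(prior, questions[q]))


# Illustrative numbers only; they are not taken from any cited study.
PRIOR = {"cardiology": 0.5, "gastroenterology": 0.5}
QUESTIONS = {
    "chest pain on exertion?": {"cardiology": 0.9, "gastroenterology": 0.1},
    "fever?": {"cardiology": 0.5, "gastroenterology": 0.5},
}
```

With these illustrative numbers, the fever question leaves the candidate set unchanged, so `best_question(PRIOR, QUESTIONS)` selects the chest-pain question.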

Table: Evolutionary Mechanisms in Agent Hospital Systems

System	Memory/Replay	Learning Loop	Empirical Gain
Agent Hospital	Record + principle lib	MedAgent-Zero self-evolution	93.06% acc. (MedQA subset)
EHRAgent	Case-based memory ℳ	Debug–execute–refine loop	+29.6% SR (TREQS)
H2-MARL	Agent + expert replay	Multi-agent RL gradient	Pareto optimality

Agent Hospital platforms thus support rapid skill acquisition, and the cited studies report performance above static baseline models, in some cases at state-of-the-art levels on external benchmarks (Li et al., 2024, Shi et al., 2024, Luo et al., 13 Mar 2025).
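
A parameter-free learning loop with a record library and a principle library, in the spirit of the MedAgent-Zero row above, might look like the following sketch. It makes several simplifying assumptions that are not from the source: similarity is plain word overlap (Jaccard), the "doctor" is a toy lookup policy rather than an LLM, and a principle is stored as a templated sentence plus a suggested diagnosis instead of an LLM-written reflection. No model weights change; only the two memories grow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two symptom descriptions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


@dataclass
class Principle:
    trigger: str     # symptoms of the misdiagnosed case
    lesson: str      # free-text lesson (templated here; assumed LLM-written in practice)
    suggestion: str  # diagnosis the lesson points to


@dataclass
class DoctorMemory:
    records: List[Tuple[str, str]] = field(default_factory=list)  # (symptoms, diagnosis)
    principles: List[Principle] = field(default_factory=list)

    def retrieve(self, symptoms: str, k: int = 2) -> List[Tuple[str, str]]:
        """Return the k most similar successfully handled cases."""
        return sorted(self.records, key=lambda r: jaccard(symptoms, r[0]), reverse=True)[:k]


Policy = Callable[[str, List[Tuple[str, str]], List[Principle]], str]


def lookup_policy(symptoms: str, records: List[Tuple[str, str]], principles: List[Principle]) -> str:
    """Toy stand-in for the LLM doctor: reuse the closest past case or lesson."""
    candidates = [(jaccard(symptoms, s), d) for s, d in records]
    candidates += [(jaccard(symptoms, p.trigger), p.suggestion) for p in principles]
    if not candidates:
        return "unknown"
    score, diagnosis = max(candidates)
    return diagnosis if score > 0 else "unknown"


def consult(memory: DoctorMemory, symptoms: str, truth: str, policy: Policy) -> bool:
    """One consultation: diagnose with retrieved context, then update memory."""
    guess = policy(symptoms, memory.retrieve(symptoms), memory.principles)
    if guess == truth:
        memory.records.append((symptoms, truth))  # keep the correct example
    else:
        lesson = f"For '{symptoms}', consider {truth} rather than {guess}."
        memory.principles.append(Principle(symptoms, lesson, truth))  # distill the error
    return guess == truth
```

In this toy setting, a first case with an empty memory is missed and produces a principle, and a later case with overlapping symptoms is then diagnosed correctly and stored as a record.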

4. Controllability, Adaptation, and Evaluation

Modern Agent Hospital systems incorporate design features for fine-grained control, procedural variability, and systematic evaluation:

Environment Control Knobs: Simulation frameworks allow adjustment of patient case parameters (age, severity, comorbidities), pathway branching, tool availability, and emergent events (e.g., adverse reactions, data dropout) (Zhu et al., 11 Dec 2025).
Departmental and Organizational Adaptability: Agentic triage and scheduling frameworks employ institution-specific department hierarchies and rule libraries, achieving high accuracy and adaptability without costly re-tuning (e.g., pattern-matching engines for diverse hospital topologies) (Cheng et al., 30 Jul 2025, Lee et al., 5 Feb 2026).
Rubric-based Evaluation: Multi-tier rubrics assess agent performance across clinical, process, and ethical dimensions (e.g., Work Completion, Diagnosis Recall@k, Inquiry Sufficiency, Privacy Safeguard, Treatment Individualization) (Zhu et al., 11 Dec 2025).
Collaborative and Debate Mechanisms: Multi-doctor collaboration and centralized dispute-resolution protocols partially bridge the gap between interactive and one-shot diagnostic performance, enhancing robustness and final consensus accuracy (Fan et al., 2024).

These features support robust benchmarking, adversarial testing, and transferability across real and synthetic hospital settings.
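
As an illustration of one rubric item named above, Diagnosis Recall@k can be computed as in the sketch below. The source does not give the benchmark's exact formula, so this uses one common definition as an assumption: the fraction of ground-truth diagnoses that appear among the agent's top k ranked candidates, averaged over cases. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of ground-truth diagnoses found in the top-k ranked candidates."""
    truth: Set[str] = set(relevant)
    if not truth:
        raise ValueError("at least one ground-truth diagnosis is required")
    if k < 1:
        raise ValueError("k must be at least 1")
    return len(set(ranked[:k]) & truth) / len(truth)


def mean_recall_at_k(cases: List[Tuple[Sequence[str], Iterable[str]]], k: int) -> float:
    """Average Recall@k over (ranked candidates, ground truth) pairs."""
    if not cases:
        raise ValueError("no cases to evaluate")
    return sum(recall_at_k(ranked, truth, k) for ranked, truth in cases) / len(cases)
```

Reporting the metric at several values of k shows whether the correct diagnosis is merely present in the differential or actually ranked near the top.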

5. Practical Applications and Real-World Implications

Agent Hospital research has produced advances in both simulation environments and directly actionable automation recipes:

Few-Shot EHR Reasoning: EHRAgent demonstrates code-based planning and autonomous tool use to translate natural-language clinical questions into executable EHR queries, reporting strong performance on three multi-table hospital datasets (e.g., +29.6% SR on TREQS vs. the best baseline). Portability to a new database is supported by substituting the schema description and tool stubs and supplying four in-context demonstrations (Shi et al., 2024).
Distributed Scheduling and Administration: DOPSG and H-AdminSim formalize distributed, partially observable agent-based management for patient scheduling, triage, and appointment systems, with reported reductions in mean tardiness of up to 36.8% and support for FHIR R5 interoperability (Mageshwari et al., 2012, Lee et al., 5 Feb 2026).
Resource and Epidemic Control: H2-MARL achieves Pareto-optimal mobility restrictions across city-wide hospital networks during outbreaks via township-level multi-agent RL, outperforming expert and single-agent policies on real, billion-scale origin-destination (OD) datasets (Luo et al., 13 Mar 2025).
Controllable Clinical Benchmarks: CP-Env and AI Hospital enable end-to-end evaluation of LLMs and medical AI agents under dynamic, multistage clinical pathways, surfacing critical failure modes (hallucination, information loss) and supporting multi-agent collaboration and deliberation (Zhu et al., 11 Dec 2025, Fan et al., 2024).
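
The notion of Pareto optimality used for dual-objective epidemic control above can be made concrete with a small dominance filter. This sketch is generic and is not the cited method: it assumes each candidate policy is summarized by two costs to be minimized, labeled here (hypothetically) as infections and mobility loss. A policy is on the Pareto front if no other policy is at least as good on both costs and strictly better on one.

```python
from typing import List, Tuple

# (infections, mobility_loss): both are costs to minimize. Labels are assumptions.
Outcome = Tuple[float, float]


def dominates(a: Outcome, b: Outcome) -> bool:
    """True if a is no worse than b on every cost and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(outcomes: List[Outcome]) -> List[Outcome]:
    """Keep the outcomes that no other outcome dominates, preserving input order."""
    return [p for p in outcomes if not any(dominates(q, p) for q in outcomes)]
```

For example, among the outcomes (100, 0.8), (120, 0.5), and (150, 0.5), the last is dominated by the second and is dropped, while the first two represent different trade-offs and both remain on the front.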

Collectively, these systems facilitate both high-fidelity research and the translation of agent-based designs into deployable, interoperable clinical and administrative solutions.

6. Challenges, Limitations, and Future Directions

Notable open questions and constraints in current Agent Hospital systems include:

Simulation–Reality Gap: Many frameworks rely on synthetic data or LLM-generated ground truth, potentially propagating biases or omitting real-world complexity (e.g., limited comorbidity modeling, expert verification bottlenecks) (Li et al., 2024).
Scalability and Real-Time Robustness: High-fidelity agents may be bottlenecked by API latency or insufficient resource abstraction; partial-information approaches can be suboptimal for global objectives (Mageshwari et al., 2012, Li et al., 2024).
Security, Privacy, and Auditability: Agentic OS approaches address safety via OS-level containment, append-only audit logs, and least-privilege skill invocation. Nonetheless, practical deployment remains contingent on regulatory compliance and policy enforcement (Yang et al., 12 Mar 2026).
Generalization and Extension: There is active interest in expanding agent evolution recipes, agent memory architectures, and interoperability features to new clinical specialties, administrative tasks, and medical modalities (e.g., imaging, laboratory integration) (Li et al., 2024, Lee et al., 5 Feb 2026, Gawade et al., 7 Mar 2025).
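
The combination of least-privilege skill invocation and append-only audit logging noted above can be sketched in a few lines. This is an application-level illustration only: it does not model OS-level containment such as user namespaces, and the hash chaining, the `AuditLog` class, the `invoke_skill` function, and the agent and skill names are assumptions introduced here rather than details of the cited system.

```python
import hashlib
import json
from typing import Dict, List, Set

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def append(self, event: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {"event": event, "prev": prev, "hash": _digest(prev, event)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True

    def __len__(self) -> int:
        return len(self._entries)


def invoke_skill(agent: str, skill: str, allowed: Dict[str, Set[str]], log: AuditLog) -> str:
    """Run a skill only if the agent's allow-list grants it; log every attempt."""
    permitted = skill in allowed.get(agent, set())
    log.append({"agent": agent, "skill": skill, "permitted": permitted})
    if not permitted:
        raise PermissionError(f"{agent} may not invoke {skill}")
    return f"{skill} executed for {agent}"
```

Denied attempts are logged before the error is raised, so the audit trail records refused requests as well as successful ones.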

Future work includes integrating multimodal inputs, fine-grained medical ontologies, advanced scheduling and assignment algorithms, reinforcement learning reward redesign, and automated, cost-effective evaluation pipelines.

7. Representative Systems and Benchmarks

The following table summarizes selected Agent Hospital systems and their domains:

System	Focus	Key Features	Reference
Agent Hospital	Full hospital simulacrum, doctor evolution	MedAgent-Zero, record/principle memory	(Li et al., 2024)
EHRAgent	Few-shot EHR QA, code planning	Interactive debug cycle, case-based memory	(Shi et al., 2024)
H-AdminSim	Administrative workflow simulation	FHIR R5 integration, multi-agent roles	(Lee et al., 5 Feb 2026)
CP-Env	Clinical pathway evaluation	Branching agentic env, 3-tier rubrics	(Zhu et al., 11 Dec 2025)
AI Hospital	Diagnostic multi-agent dialogue	MVME benchmark, consensus protocol	(Fan et al., 2024)
H2-MARL	Epidemic resource control	Multi-agent RL, Pareto dual-objective	(Luo et al., 13 Mar 2025)
DOPSG	Distributed scheduling	Partial info, minimal comms, grouping	(Mageshwari et al., 2012)