Toward the Autonomous AI Doctor: Quantitative Benchmarking of an Autonomous Agentic AI Versus Board-Certified Clinicians in a Real World Setting (2507.22902v1)

Published 27 Jun 2025 in cs.HC, cs.AI, cs.CL, and cs.MA

Abstract: Background: Globally we face a projected shortage of 11 million healthcare practitioners by 2030, and administrative burden consumes 50% of clinical time. AI has the potential to help alleviate these problems. However, no end-to-end autonomous LLM-based AI system has been rigorously evaluated in real-world clinical practice. In this study, we evaluated whether a multi-agent LLM-based AI framework can function autonomously as an AI doctor in a virtual urgent care setting. Methods: We retrospectively compared the performance of the multi-agent AI system Doctronic and board-certified clinicians across 500 consecutive urgent-care telehealth encounters. The primary end points: diagnostic concordance, treatment plan consistency, and safety metrics, were assessed by blinded LLM-based adjudication and expert human review. Results: The top diagnosis of Doctronic and clinician matched in 81% of cases, and the treatment plan aligned in 99.2% of cases. No clinical hallucinations occurred (e.g., diagnosis or treatment not supported by clinical findings). In an expert review of discordant cases, AI performance was superior in 36.1%, and human performance was superior in 9.3%; the diagnoses were equivalent in the remaining cases. Conclusions: In this first large-scale validation of an autonomous AI doctor, we demonstrated strong diagnostic and treatment plan concordance with human clinicians, with AI performance matching and in some cases exceeding that of practicing clinicians. These findings indicate that multi-agent AI systems achieve comparable clinical decision-making to human providers and offer a potential solution to healthcare workforce shortages.

Collections

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Generate Now

Authors (8)

Tweets

https://twitter.com/iAmGiliyar/status/1955111534902030435