
Large Language Models as Agents in the Clinic (2309.10895v1)

Published 19 Sep 2023 in cs.HC and cs.MA

Abstract: Recent developments in LLMs have unlocked new opportunities for healthcare, from information synthesis to clinical decision support. These new LLMs are not just capable of modeling language, but can also act as intelligent "agents" that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that measure a model's ability to process clinical data or answer standardized test questions, LLM agents should be assessed for their performance on real-world clinical tasks. These new evaluation frameworks, which we call "Artificial-intelligence Structured Clinical Examinations" ("AI-SCI"), can draw from comparable technologies where machines operate with varying degrees of self-governance, such as self-driving cars. High-fidelity simulations may also be used to evaluate interactions between users and LLMs within a clinical workflow, or to model the dynamic interactions of multiple LLMs. Developing these robust, real-world clinical evaluations will be crucial towards deploying LLM agents into healthcare.

Authors
  1. Nikita Mehandru
  2. Brenda Y. Miao
  3. Eduardo Rodriguez Almaraz
  4. Madhumita Sushil
  5. Atul J. Butte
  6. Ahmed Alaa