EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records (2401.07128v3)
Abstract: Large language models (LLMs) have demonstrated exceptional planning and tool-use capabilities as autonomous agents, but few have been developed for medical problem-solving. We propose EHRAgent, an LLM agent empowered with a code interface, to autonomously generate and execute code for multi-tabular reasoning over electronic health records (EHRs). First, we formulate the EHR question-answering task as a tool-use planning process, efficiently decomposing a complicated task into a sequence of manageable actions. By integrating interactive coding with execution feedback, EHRAgent learns from error messages and iteratively improves its initially generated code. Furthermore, we enhance the agent with long-term memory, which allows EHRAgent to select and build upon the most relevant successful cases from past experience. Experiments on three real-world multi-tabular EHR datasets show that EHRAgent outperforms the strongest baseline by up to 29.6% in success rate. EHRAgent leverages the emerging few-shot learning capabilities of LLMs, enabling autonomous code generation and execution to tackle complex clinical tasks with minimal demonstrations.
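The generate-execute-refine loop the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: `fake_llm` is a hypothetical stand-in for a real LLM call (it returns buggy code on the first attempt and a corrected version once the prompt contains an error trace), and the toy `table` stands in for an EHR table.

```python
import traceback

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM: it "self-debugs" once the
    # prompt contains the error trace from a failed execution.
    if "NameError" in prompt:
        return ("rows = [r for r in table if r['hadm_id'] == 42]\n"
                "answer = len(rows)")
    # First attempt contains a deliberate bug: undefined name 'tables'.
    return ("rows = [r for r in tables if r['hadm_id'] == 42]\n"
            "answer = len(rows)")

def run_with_feedback(question: str, table, max_iters: int = 3):
    """Generate code, execute it, and feed errors back for refinement."""
    prompt = question
    for _ in range(max_iters):
        code = fake_llm(prompt)
        env = {"table": table}
        try:
            exec(code, env)            # execute the generated program
            return env["answer"]       # success: return the computed answer
        except Exception:
            # Append the traceback so the next generation can correct itself.
            prompt += "\n# execution error:\n" + traceback.format_exc()
    return None  # all attempts failed

table = [{"hadm_id": 42}, {"hadm_id": 7}, {"hadm_id": 42}]
print(run_with_feedback("How many rows have hadm_id 42?", table))  # → 2
```

The first iteration fails with a `NameError`, the trace is appended to the prompt, and the second iteration succeeds; the paper's long-term memory component would additionally retrieve relevant past successes to seed the prompt.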