MMedAgent: Learning to Use Medical Tools with Multi-modal Agent (2407.02483v2)

Published 2 Jul 2024 in cs.CL and cs.AI

Abstract: Multi-Modal Large Language Models (MLLMs), despite their success, exhibit limited generality and often fall short of specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such agents have not been extensively explored in the medical domain. To bridge this gap, this paper introduces the first agent explicitly designed for the medical field, named Multi-modal Medical Agent (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools solving seven tasks across five modalities, enabling the agent to choose the most suitable tool for a given task. Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model GPT-4o. Furthermore, MMedAgent can efficiently update and integrate new medical tools. Code and models are publicly available.
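The abstract describes an agent that picks a specialized model as a tool for each request. The sketch below illustrates only that routing idea, under stated assumptions. It is not the MMedAgent implementation. The `Tool` and `ToolRegistry` classes, the `answer` helper, the tool names `toy_detector` and `toy_segmenter`, and the task and modality labels are all hypothetical. An explicit (task, modality) lookup stands in for the tool choice that MMedAgent learns from its instruction-tuning data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    # Hypothetical tool description; not taken from the paper.
    name: str
    tasks: List[str]          # task labels this tool can solve (illustrative)
    modalities: List[str]     # imaging modalities it accepts (illustrative)
    run: Callable[[str], str] # toy stand-in for calling a specialized model


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        # Adding a tool is a single insert; the dispatch logic is unchanged.
        self._tools[tool.name] = tool

    def select(self, task: str, modality: str) -> Tool:
        # Return the first registered tool (in insertion order) that covers
        # both the task and the modality. In MMedAgent, an MLLM makes this
        # choice from the user request; this lookup is only a stand-in.
        for tool in self._tools.values():
            if task in tool.tasks and modality in tool.modalities:
                return tool
        raise LookupError(f"no tool for task={task!r}, modality={modality!r}")


def answer(registry: ToolRegistry, task: str, modality: str, image_id: str) -> str:
    # Route the request to a tool, run it, and label the result with the tool name.
    tool = registry.select(task, modality)
    return f"{tool.name}: {tool.run(image_id)}"


# Usage with two hypothetical toy tools.
registry = ToolRegistry()
registry.register(Tool("toy_detector", ["detection"], ["CT", "MRI"],
                       lambda img: f"boxes for {img}"))
registry.register(Tool("toy_segmenter", ["segmentation"], ["CT"],
                       lambda img: f"mask for {img}"))

print(answer(registry, "segmentation", "CT", "scan_001"))  # toy_segmenter: mask for scan_001
```

Because selection reads from a registry, adding a tool in this sketch is one `register` call. This loosely mirrors the abstract's point about integrating new tools. In MMedAgent itself, the abstract attributes tool choice to the curated instruction-tuning dataset, not to a lookup table.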

Authors

Binxu Li
Tiankai Yan
Yuanting Pan
Zhe Xu
Jie Luo
Ruiyang Ji
Shilong Liu
Haoyu Dong
Zihao Lin
Yixin Wang
Jiayuan Ding
