
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning (2405.11640v1)

Published 19 May 2024 in cs.AI, cs.CL, and cs.CV

Abstract: The adoption of LLMs in healthcare has attracted significant research interest. However, their performance in healthcare remains under-investigated and potentially limited, because i) they lack rich domain-specific knowledge and medical reasoning skills, and ii) most state-of-the-art LLMs are unimodal, text-only models that cannot directly process multimodal inputs. To this end, we propose **MultiMedRes**, a multimodal medical collaborative reasoning framework that incorporates a learner agent to proactively gain essential information from domain-specific expert models and solve medical multimodal reasoning problems. Our method comprises three steps: i) **Inquire**: the learner agent first decomposes a given complex medical reasoning problem into multiple domain-specific sub-problems; ii) **Interact**: the agent then interacts with domain-specific expert models, repeating the "ask-answer" process to progressively obtain different domain-specific knowledge; iii) **Integrate**: the agent finally integrates all the acquired domain-specific knowledge to accurately address the medical reasoning problem. We validate the effectiveness of our method on the task of difference visual question answering for X-ray images. The experiments demonstrate that our zero-shot prediction achieves state-of-the-art performance, even outperforming fully supervised methods. Moreover, our approach can be incorporated into various LLMs and multimodal LLMs to significantly boost their performance.
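The three-step loop described in the abstract can be sketched as follows. This is a minimal, illustrative sketch only: the function names, the stub expert models, and the string-based decomposition are all assumptions standing in for the paper's actual LLM learner agent and domain-specific expert models.

```python
# Hypothetical sketch of the Inquire-Interact-Integrate loop.
# Stub functions replace the LLM calls used in the actual framework.

def inquire(question):
    # Inquire: decompose the complex medical question into
    # domain-specific sub-questions (a real system would prompt
    # the learner LLM to do this).
    return [f"[{domain}] {question}" for domain in ("anatomy", "abnormality")]

def interact(sub_questions, experts):
    # Interact: repeat the "ask-answer" process with each expert
    # model, accumulating domain-specific knowledge.
    knowledge = {}
    for sub_q in sub_questions:
        domain = sub_q.split("]")[0].lstrip("[")
        knowledge[sub_q] = experts[domain](sub_q)
    return knowledge

def integrate(question, knowledge):
    # Integrate: combine all acquired answers into a final response
    # (a real system would feed this context back to the learner LLM).
    context = "; ".join(f"{q} -> {a}" for q, a in knowledge.items())
    return f"Answer to '{question}' given: {context}"

# Stub expert models keyed by domain (illustrative placeholders).
experts = {
    "anatomy": lambda q: "left lower lobe",
    "abnormality": lambda q: "opacity increased",
}

question = "What changed between the two chest X-rays?"
subs = inquire(question)
facts = interact(subs, experts)
final = integrate(question, facts)
```

The key design point the sketch tries to capture is that the learner agent, not the expert models, drives the dialogue: decomposition and integration are its responsibility, while the experts only answer narrow sub-questions.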

Authors (4)
  1. Zishan Gu (3 papers)
  2. Fenglin Liu (54 papers)
  3. Changchang Yin (22 papers)
  4. Ping Zhang (436 papers)