Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

MedRG: Medical Report Grounding with Multi-modal Large Language Model (2404.06798v1)

Published 10 Apr 2024 in cs.CV

Abstract: Medical Report Grounding is pivotal in identifying the most relevant regions in medical images based on a given phrase query, a critical aspect in medical image analysis and radiological diagnosis. However, prevailing visual grounding approaches necessitate the manual extraction of key phrases from medical reports, imposing substantial burdens on both system efficiency and physicians. In this paper, we introduce a novel framework, Medical Report Grounding (MedRG), an end-to-end solution for utilizing a multi-modal LLM to predict key phrase by incorporating a unique token, BOX, into the vocabulary to serve as an embedding for unlocking detection capabilities. Subsequently, the vision encoder-decoder jointly decodes the hidden embedding and the input medical image, generating the corresponding grounding box. The experimental results validate the effectiveness of MedRG, surpassing the performance of the existing state-of-the-art medical phrase grounding methods. This study represents a pioneering exploration of the medical report grounding task, marking the first-ever endeavor in this domain.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (1)
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (10)
  1. Ke Zou (35 papers)
  2. Yang Bai (205 papers)
  3. Zhihao Chen (66 papers)
  4. Yang Zhou (311 papers)
  5. Yidi Chen (7 papers)
  6. Kai Ren (18 papers)
  7. Meng Wang (1063 papers)
  8. Xuedong Yuan (10 papers)
  9. Xiaojing Shen (31 papers)
  10. Huazhu Fu (185 papers)
Citations (3)

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com