Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery (2405.10948v1)

Published 22 Mar 2024 in cs.CV, cs.AI, cs.RO, and eess.IV

Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recognizing long-range dependencies and aligning multimodal information. In this paper, we introduce Surgical-LVLM, a novel personalized large vision-language model tailored for complex surgical scenarios. Leveraging the pre-trained large vision-language model and specialized Visual Perception LoRA (VP-LoRA) blocks, our model excels in understanding complex visual-language tasks within surgical contexts. In addressing the visual grounding task, we propose the Token-Interaction (TIT) module, which strengthens the interaction between the grounding module and the language responses of the Large Vision-Language Model (LVLM) after projecting them into the latent space. We demonstrate the effectiveness of Surgical-LVLM on several benchmarks, including EndoVis-17-VQLA, EndoVis-18-VQLA, and a newly introduced EndoVis Conversations dataset, which sets new performance standards. Our work contributes to advancing the field of automated surgical mentorship by providing a context-aware solution.
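
The abstract names two architectural pieces: Visual Perception LoRA (VP-LoRA) blocks that adapt the pre-trained vision-language backbone, and a Token-Interaction (TIT) module that fuses the LVLM's language responses with the grounding module after projecting them into a shared latent space. The paper's exact design is not reproduced here; the snippet below is only a minimal PyTorch sketch of the generic ideas it builds on (a low-rank adapter over a frozen linear layer, and cross-attention between projected visual and language tokens). All class names, dimensions, and the attention layout are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: generic LoRA adapter + cross-attention token fusion.
# Names, ranks, and dimensions are assumptions for illustration.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weights frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as a zero update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class TokenInteraction(nn.Module):
    """Grounding (visual) tokens attend to projected language-response tokens in a shared latent space."""

    def __init__(self, vis_dim: int, lang_dim: int, latent_dim: int = 256, heads: int = 8):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, latent_dim)
        self.lang_proj = nn.Linear(lang_dim, latent_dim)
        self.attn = nn.MultiheadAttention(latent_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, vis_tokens: torch.Tensor, lang_tokens: torch.Tensor) -> torch.Tensor:
        q = self.vis_proj(vis_tokens)     # (B, Nv, D) visual queries
        kv = self.lang_proj(lang_tokens)  # (B, Nl, D) language keys/values
        fused, _ = self.attn(q, kv, kv)
        return self.norm(q + fused)       # residual fusion in the shared latent space


if __name__ == "__main__":
    vis = torch.randn(2, 196, 1024)   # hypothetical visual feature tokens
    lang = torch.randn(2, 32, 4096)   # hypothetical LVLM response embeddings
    adapted = LoRALinear(nn.Linear(1024, 1024))(vis)
    fused = TokenInteraction(1024, 4096)(adapted, lang)
    print(fused.shape)  # torch.Size([2, 196, 256])
```

The demo only checks shapes; in a real pipeline the LoRA adapters would be attached to selected layers of the frozen backbone and the fused tokens would feed a grounding head.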

Authors (10)
  1. Guankun Wang (20 papers)
  2. Long Bai (87 papers)
  3. Wan Jun Nah (3 papers)
  4. Jie Wang (480 papers)
  5. Zhaoxi Zhang (19 papers)
  6. Zhen Chen (151 papers)
  7. Jinlin Wu (37 papers)
  8. Mobarakol Islam (65 papers)
  9. Hongbin Liu (80 papers)
  10. Hongliang Ren (98 papers)
Citations (8)
