LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery (2402.16664v3)
Abstract: Visual question answering (VQA) is crucial for promoting surgical education. In practice, the needs of trainees are constantly evolving, such as learning more types of surgery, adapting to different robots, and mastering new surgical instruments and techniques. However, patient data privacy often restricts the availability of old data when updating the model, necessitating an exemplar-free continual learning (CL) setup. Prior CL studies overlooked two vital problems in the surgical domain: 1) large domain shifts from diverse surgical operations collected from multiple sources, and 2) severe data imbalance arising from the uneven presence of surgical instruments or activities. This paper proposes addressing these problems with a multimodal LLM and an adaptive weight assignment methodology. We first develop a new multi-teacher CL framework that leverages a multimodal LLM as an additional teacher. The strong generalization ability of the LLM can bridge the knowledge gap when domain shifts and data imbalances occur. We then put forth a novel data processing method that transforms complex LLM embeddings into logits compatible with our CL framework. We further design an adaptive weight assignment approach that balances the generalization ability of the LLM and the domain expertise of the old CL model. Finally, to comprehensively test the effectiveness of our proposed method, we have also constructed two new surgical VQA datasets that are substantially different from existing ones and could be valuable resources for future research. Extensive experimental results on the tested datasets demonstrate the superiority of our method over other advanced CL schemes.
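The multi-teacher framework described in the abstract can be pictured as a distillation loss that mixes supervision from two teachers: the old CL model (domain expertise) and the multimodal LLM (generalization). Below is a minimal PyTorch sketch, not the authors' implementation: the fixed weight `alpha`, the temperature `T`, and the tensor shapes are illustrative assumptions, and the paper assigns the teacher weight adaptively rather than as a constant.

```python
# Minimal sketch of multi-teacher knowledge distillation for a VQA classifier.
# Hypothetical hyperparameters: alpha (teacher balance), T (softmax temperature).
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, old_model_logits, llm_logits,
                          labels, alpha=0.5, T=2.0):
    """Cross-entropy on new data plus distillation from two teachers.

    alpha trades off the old CL model against the LLM teacher; in the paper
    this weight is assigned adaptively, here it is fixed for illustration.
    """
    ce = F.cross_entropy(student_logits, labels)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_old = F.softmax(old_model_logits / T, dim=-1)
    p_llm = F.softmax(llm_logits / T, dim=-1)
    # Standard temperature-scaled KL distillation terms.
    kd_old = F.kl_div(log_p_student, p_old, reduction="batchmean") * T * T
    kd_llm = F.kl_div(log_p_student, p_llm, reduction="batchmean") * T * T
    return ce + alpha * kd_old + (1.0 - alpha) * kd_llm

if __name__ == "__main__":
    # Toy example: batch of 4 samples, 20 candidate answers.
    s = torch.randn(4, 20, requires_grad=True)
    t_old, t_llm = torch.randn(4, 20), torch.randn(4, 20)
    y = torch.randint(0, 20, (4,))
    loss = multi_teacher_kd_loss(s, t_old, t_llm, y)
    loss.backward()
    print(loss.item())
```

The constant `alpha` only marks where the paper's adaptive weight would enter; the LLM logits are likewise assumed to have already been produced by the embedding-to-logits conversion the abstract mentions.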
- Kexin Chen
- Yuyang Du
- Tao You
- Mobarakol Islam
- Ziyu Guo
- Yueming Jin
- Guangyong Chen
- Pheng-Ann Heng
- Yue Zhan
- Chang Han Low