ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs (2305.15964v5)

Published 25 May 2023 in cs.CV

Abstract: The integration of Computer-Aided Diagnosis (CAD) with LLMs presents a promising frontier in clinical applications, notably in automating diagnostic processes akin to those performed by radiologists and providing consultations similar to a virtual family doctor. Despite the promising potential of this integration, current works face at least two limitations: (1) From the perspective of a radiologist, existing studies typically have a restricted scope of applicable imaging domains, failing to meet the diagnostic needs of different patients. Also, the insufficient diagnostic capability of LLMs further undermines the quality and reliability of the generated medical reports. (2) Current LLMs lack the requisite depth in medical expertise, rendering them less effective as virtual family doctors due to the potential unreliability of the advice provided during patient consultations. To address these limitations, we introduce ChatCAD+, designed to be universal and reliable. Specifically, it features two main modules: (1) Reliable Report Generation and (2) Reliable Interaction. The Reliable Report Generation module is capable of interpreting medical images from diverse domains and generating high-quality medical reports via our proposed hierarchical in-context learning. Concurrently, the interaction module leverages up-to-date information from reputable medical websites to provide reliable medical advice. Together, these designed modules synergize to closely align with the expertise of human medical professionals, offering enhanced consistency and reliability for interpretation and advice. The source code is available at https://github.com/zhaozh10/ChatCAD.

A Review of "ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs"

The paper "ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs" presents an intriguing approach to integrating LLMs into computer-aided diagnosis (CAD) systems. It focuses on overcoming existing limitations in merging LLMs with CAD, such as restricted imaging domain scope and inadequate medical expertise of LLMs. The proposed solution, ChatCAD+, emphasizes universality and reliability, situating itself as an interactive CAD system capable of generating dependable diagnostics and interacting effectively with medical professionals and patients.

One of the central components of ChatCAD+ is the integration of multi-domain CAD models, which addresses the limitation of previous systems confined to specific imaging modalities. By incorporating a domain identification module, ChatCAD+ can process various medical images, selecting the CAD network tailored to each domain. This adaptability enhances the system's generalizability across diverse clinical environments. In particular, the use of BiomedCLIP for domain identification illustrates an effective adaptation of an existing vision-language architecture to the demands of medical imaging; a sketch of this routing step follows.
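
As a rough illustration, domain identification can be framed as zero-shot image classification with a CLIP-style model: the image embedding is compared against text embeddings of candidate domain descriptions, and the best match selects the downstream CAD network. This minimal sketch assumes the open_clip packaging of BiomedCLIP; the model tag, domain prompts, and CAD-model registry are illustrative, not taken from the paper's code.

```python
# Zero-shot domain routing with a CLIP-style model (illustrative sketch).
import torch
from PIL import Image
import open_clip

# Assumption: BiomedCLIP loaded via open_clip's Hugging Face hub support.
TAG = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = open_clip.create_model_from_pretrained(TAG)
tokenizer = open_clip.get_tokenizer(TAG)

# One text prompt per supported imaging domain; each maps to a
# dedicated CAD network (these names are placeholders, not the paper's).
DOMAINS = {
    "a chest X-ray": "chest_xray_cad",
    "a dental panoramic radiograph": "dental_cad",
    "a knee MRI scan": "knee_mri_cad",
}

def route(image_path: str) -> str:
    """Return the name of the CAD model matching the image's domain."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    text = tokenizer(list(DOMAINS.keys()))
    with torch.no_grad():
        img = model.encode_image(image)
        txt = model.encode_text(text)
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        sims = (img @ txt.T).squeeze(0)  # cosine similarity per domain
    return list(DOMAINS.values())[int(sims.argmax())]
```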

Moreover, the system employs a hierarchical in-context learning mechanism to raise the quality of report generation. It retrieves semantically similar reports and uses them as in-context examples for refining the initial report drafts generated by LLMs. Through this two-stage process, preliminary report generation followed by template-guided refinement, ChatCAD+ substantially improves the coherence, relevance, and precision of its diagnostics compared to earlier models. The paper provides detailed numerical results confirming improvements in NLG metrics (BLEU, ROUGE-L, METEOR) and clinical efficacy metrics (precision, recall, F1-score), underscoring the practical advantages of the approach.
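
The retrieval-then-refine flow can be sketched as below, assuming an archive of past reports with precomputed embeddings. Here embed() and llm_complete() are hypothetical stand-ins for an embedding model and a chat LLM, and the prompt wording is illustrative rather than the paper's.

```python
# Retrieve similar reports as in-context examples, then refine the draft.
import numpy as np

def top_k_similar(draft_vec, report_vecs, reports, k=3):
    """Rank archived reports by cosine similarity to the draft report."""
    sims = report_vecs @ draft_vec / (
        np.linalg.norm(report_vecs, axis=1) * np.linalg.norm(draft_vec))
    return [reports[i] for i in np.argsort(-sims)[:k]]

def refine_report(draft, reports, report_vecs, embed, llm_complete):
    """Two-stage generation: retrieve exemplars, then ask the LLM to revise."""
    examples = top_k_similar(embed(draft), report_vecs, reports)
    prompt = (
        "Revise the draft radiology report so that its wording and "
        "structure match the example reports.\n\n"
        + "\n\n".join(f"Example {i + 1}:\n{r}" for i, r in enumerate(examples))
        + f"\n\nDraft:\n{draft}\n\nRevised report:"
    )
    return llm_complete(prompt)
```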

Furthermore, ChatCAD+ implements a robust LLM-based knowledge retrieval system for reliable interactions. This component leverages external databases, such as the Merck Manuals, to furnish diagnostically accurate and contextually aware responses to clinical queries. Using the LLM to recursively search for relevant knowledge across hierarchically organized medical topics is a refined way of ensuring that responses align with clinical realities, thus enhancing patient trust and understanding.
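
To make the recursive search concrete, the sketch below descends a Merck Manuals-style topic tree by asking the LLM to pick the most relevant branch at each level until a leaf passage is reached. The tree representation (nested dicts with leaf strings) and the llm_complete() stand-in are assumptions for illustration, not the paper's implementation.

```python
# LLM-guided descent through a hierarchically organized knowledge base.

def choose(llm_complete, query, options):
    """Ask the LLM which child topic best matches the query."""
    prompt = (
        f"Question: {query}\n"
        f"Topics: {', '.join(options)}\n"
        "Reply with the single most relevant topic name."
    )
    answer = llm_complete(prompt).strip()
    # Fall back to the first topic if the reply is not an exact match.
    return answer if answer in options else options[0]

def retrieve(llm_complete, query, node):
    """Descend the topic tree until a leaf reference passage is reached."""
    while isinstance(node, dict):  # inner node: named subtopics
        topic = choose(llm_complete, query, list(node.keys()))
        node = node[topic]
    return node  # leaf: reference text used to ground the LLM's answer
```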

The implications of this research are substantial. Practically, ChatCAD+ increases the accessibility and reliability of automated medical consultation, potentially easing the workload of healthcare providers and enhancing patient self-care. Theoretically, it presents a viable framework for continuously integrating emerging medical knowledge into AI-driven diagnostic tools—aligning closely with the dynamic nature of clinical medicine. However, the reliance on external databases and the inherent requirement for substantial computational resources present areas for future development. Addressing these limitations could lead to broader adoption and integration into existing healthcare systems.

In conclusion, the paper establishes a convincing case for the deployment of specialized CAD models alongside LLMs to achieve both breadth and depth in medical diagnostics. With advancements in prompt designs and multi-module systems such as ChatCAD+, the intersection of LLMs and CAD is poised to transform digital healthcare delivery, allowing for more personalized and scalable medical diagnostics. Future directions may include expanding the range of integrated medical knowledge bases and improving the efficiency of model components to reduce computational overhead. These advancements will likely encourage further research and application in AI-driven healthcare solutions.

Authors (9)
  1. Zihao Zhao
  2. Sheng Wang
  3. Jinchen Gu
  4. Yitao Zhu
  5. Lanzhuju Mei
  6. Zixu Zhuang
  7. Zhiming Cui
  8. Qian Wang
  9. Dinggang Shen