
UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt -- A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis (2404.17749v2)

Published 27 Apr 2024 in cs.AI and cs.CL

Abstract: This paper presents our team's participation in the MEDIQA-ClinicalNLP2024 shared task B. We present a novel approach to diagnosing clinical dermatology cases by integrating large multimodal models, specifically leveraging the capabilities of GPT-4V under a retriever and a re-ranker framework. Our investigation reveals that GPT-4V, when used as a retrieval agent, can accurately retrieve the correct skin condition 85% of the time using dermatological images and brief patient histories. Additionally, we empirically show that Naive Chain-of-Thought (CoT) works well for retrieval while Medical Guidelines Grounded CoT is required for accurate dermatological diagnosis. Further, we introduce a Multi-Agent Conversation (MAC) framework and show its superior performance and potential over the best CoT strategy. The experiments suggest that, by using naive CoT for retrieval and multi-agent conversation for critique-based diagnosis, GPT-4V can lead to an early and accurate diagnosis of dermatological conditions. The implications of this work extend to improving diagnostic workflows, supporting dermatological education, and enhancing patient care by providing a scalable, accessible, and accurate diagnostic tool.
