
MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning (2405.01583v1)

Published 27 Apr 2024 in cs.CL, cs.AI, cs.CV, and cs.LG

Abstract: The MEDIQA-M3G 2024 challenge necessitates novel solutions for Multilingual & Multimodal Medical Answer Generation in dermatology (Yim et al., 2024a). This paper addresses the limitations of traditional methods by proposing a weakly supervised learning approach for open-ended medical question-answering (QA). Our system leverages readily available MEDIQA-M3G images via a VGG16-CNN-SVM model, enabling multilingual (English, Chinese, Spanish) learning of informative skin condition representations. Using pre-trained QA models, we further bridge the gap between visual and textual information through multimodal fusion. This approach tackles complex, open-ended questions even without predefined answer choices. We enable the generation of comprehensive answers by feeding the ViT-CLIP model with multiple responses alongside images. This work advances medical QA research, paving the way for clinical decision support systems and ultimately improving healthcare delivery.

References (22)
  1. VQA-Med: Overview of the medical visual question answering task at ImageCLEF 2019. CLEF (working notes), 2(6).
  2. Overview of the MEDIQA 2019 shared task on textual inference, question entailment and question answering. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 370–379.
  3. Survey on SVM and their application in image classification. International Journal of Information Technology, 13(5):1–11.
  4. ELECTRA: Pre-training text encoders as discriminators rather than generators.
  5. Diogo Cortiz. 2022. Exploring transformers models for emotion recognition: A comparision of BERT, DistilBERT, RoBERTa, XLNet and ELECTRA. In Proceedings of the 2022 3rd International Conference on Control, Robotics and Intelligent System, pages 230–234.
  6. Hybrid approach for content-based image retrieval using VGG16 layered architecture and SVM: an application of deep learning. SN Computer Science, 2(3):170.
  7. Peter Elsner. 2020. Teledermatology in the times of COVID-19–a systematic review. JDDG: Journal Der Deutschen Dermatologischen Gesellschaft, 18(8):841–845.
  8. Towards transparency in dermatology image datasets with skin tone annotations by experts, crowds, and an algorithm. Proceedings of the ACM on Human-Computer Interaction, 6(CSCW2):1–26.
  9. MSQ-BioBERT: Ambiguity resolution to enhance BioBERT medical question-answering. In Proceedings of the ACM Web Conference 2023, pages 4020–4028.
  10. DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. arXiv preprint arXiv:2111.09543.
  11. Review of teledermatology: lessons learned from the COVID-19 pandemic. American Journal of Clinical Dermatology, 25(1):5–14.
  12. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension.
  13. Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm. In International Conference on Learning Representations.
  14. Artificial intelligence in dermatology image analysis: current developments and future trends. Journal of Clinical Medicine, 11(22):6826.
  15. Medical visual question answering: A survey. Artificial Intelligence in Medicine, page 102611.
  16. Telemedicine technologies and applications in the era of COVID-19 pandemic: A systematic review. Health Informatics Journal, 29(2):14604582231167431.
  17. A pragmatic assessment of Google Translate for emergency department instructions. Journal of General Internal Medicine, 36(11):3361–3365.
  18. Overview of the MEDIQA-M3G 2024 shared task on multilingual and multimodal medical answer generation. In Proceedings of the 6th Clinical Natural Language Processing Workshop, Mexico City, Mexico. Association for Computational Linguistics.
  19. DermaVQA: A multilingual visual question answering dataset for dermatology. CoRR.
  20. A-ViT: Adaptive tokens for efficient vision transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10809–10818.
  21. Mutual attention inception network for remote sensing visual question answering. IEEE Transactions on Geoscience and Remote Sensing, 60:1–14.
  22. COVID-19 detection based on image regrouping and ResNet-SVM using chest X-ray images. IEEE Access, 9:81902–81912.