MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning (2405.01583v1)
Abstract: The MEDIQA-M3G 2024 challenge calls for new approaches to Multilingual & Multimodal Medical Answer Generation in dermatology (Yim et al., 2024a). This paper addresses the limitations of traditional methods by proposing a weakly supervised learning approach for open-ended medical question answering (QA). Our system uses the images provided with MEDIQA-M3G to train a VGG16-CNN-SVM model, which learns informative skin condition representations across three languages (English, Chinese, Spanish). Pre-trained QA models then bridge visual and textual information through multimodal fusion, so the system can handle complex, open-ended questions without predefined answer choices. To produce comprehensive answers, we feed the ViT-CLIP model multiple candidate responses alongside the images. This work advances medical QA research and supports the development of clinical decision support systems, with the aim of improving healthcare delivery.
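The abstract does not spell out how the ViT-CLIP model uses the multiple responses. One plausible reading is that each candidate response is scored against the image in a shared embedding space, as CLIP-style models do, and the best match is kept. The sketch below illustrates only that selection step, under stated assumptions: the embeddings are hand-written toy vectors standing in for real image and text encoder outputs, and the names `cosine_similarity`, `select_best_response`, and the candidate labels are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def select_best_response(
    image_embedding: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
) -> Tuple[str, float]:
    """Return the candidate text whose embedding best matches the image.

    `candidates` is a list of (response_text, text_embedding) pairs.
    """
    if not candidates:
        raise ValueError("at least one candidate response is required")
    scored = [
        (cosine_similarity(image_embedding, emb), text)
        for text, emb in candidates
    ]
    best_score, best_text = max(scored, key=lambda pair: pair[0])
    return best_text, best_score


# Toy stand-ins for encoder outputs (assumption: real embeddings would come
# from an image encoder and a text encoder trained into a shared space).
toy_image_embedding = [0.9, 0.1, 0.0]
toy_candidates = [
    ("candidate A", [0.8, 0.2, 0.1]),
    ("candidate B", [0.1, 0.9, 0.0]),
    ("candidate C", [0.0, 0.1, 0.9]),
]

best_text, best_score = select_best_response(toy_image_embedding, toy_candidates)
print(best_text)
```

With these toy vectors the function picks "candidate A", whose embedding points in nearly the same direction as the image embedding. In a real pipeline the candidates would be the outputs of the pre-trained QA models, and the embeddings would come from the vision-language model rather than being written by hand.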
- VQA-Med: Overview of the medical visual question answering task at ImageCLEF 2019. CLEF (Working Notes), 2(6).
- Overview of the MEDIQA 2019 shared task on textual inference, question entailment and question answering. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 370–379.
- Survey on SVM and their application in image classification. International Journal of Information Technology, 13(5):1–11.
- ELECTRA: Pre-training text encoders as discriminators rather than generators.
- Diogo Cortiz. 2022. Exploring transformers models for emotion recognition: A comparision of BERT, DistilBERT, RoBERTa, XLNet and ELECTRA. In Proceedings of the 2022 3rd International Conference on Control, Robotics and Intelligent System, pages 230–234.
- Hybrid approach for content-based image retrieval using VGG16 layered architecture and SVM: an application of deep learning. SN Computer Science, 2(3):170.
- Peter Elsner. 2020. Teledermatology in the times of COVID-19–a systematic review. JDDG: Journal der Deutschen Dermatologischen Gesellschaft, 18(8):841–845.
- Towards transparency in dermatology image datasets with skin tone annotations by experts, crowds, and an algorithm. Proceedings of the ACM on Human-Computer Interaction, 6(CSCW2):1–26.
- MSQ-BioBERT: Ambiguity resolution to enhance BioBERT medical question-answering. In Proceedings of the ACM Web Conference 2023, pages 4020–4028.
- DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. arXiv preprint arXiv:2111.09543.
- Review of teledermatology: lessons learned from the COVID-19 pandemic. American Journal of Clinical Dermatology, 25(1):5–14.
- BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension.
- Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm. In International Conference on Learning Representations.
- Artificial intelligence in dermatology image analysis: current developments and future trends. Journal of Clinical Medicine, 11(22):6826.
- Medical visual question answering: A survey. Artificial Intelligence in Medicine, page 102611.
- Telemedicine technologies and applications in the era of COVID-19 pandemic: A systematic review. Health Informatics Journal, 29(2):14604582231167431.
- A pragmatic assessment of Google Translate for emergency department instructions. Journal of General Internal Medicine, 36(11):3361–3365.
- Overview of the MEDIQA-M3G 2024 shared task on multilingual and multimodal medical answer generation. In Proceedings of the 6th Clinical Natural Language Processing Workshop, Mexico City, Mexico. Association for Computational Linguistics.
- DermaVQA: A multilingual visual question answering dataset for dermatology. CoRR.
- A-ViT: Adaptive tokens for efficient vision transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10809–10818.
- Mutual attention inception network for remote sensing visual question answering. IEEE Transactions on Geoscience and Remote Sensing, 60:1–14.
- COVID-19 detection based on image regrouping and ResNet-SVM using chest X-ray images. IEEE Access, 9:81902–81912.