Free Form Medical Visual Question Answering in Radiology (2401.13081v1)
Abstract: Visual Question Answering (VQA) in the medical domain presents a unique, interdisciplinary challenge, combining fields such as Computer Vision, Natural Language Processing, and Knowledge Representation. Despite its importance, research in medical VQA has been scant, only gaining momentum since 2018. Addressing this gap, our research delves into the effective representation of radiology images and the joint learning of multimodal representations, surpassing existing methods. We innovatively augment the SLAKE dataset, enabling our model to respond to a more diverse array of questions, not limited to the immediate content of radiology or pathology images. Our model achieves a top-1 accuracy of 79.55% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models. This research not only advances medical VQA but also opens avenues for practical applications in diagnostic settings.
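To make the abstract's "joint learning of multimodal representations" and "top-1 accuracy" concrete, the sketch below wires a toy image encoder and a toy question encoder into a single classifier over a fixed answer vocabulary. This is only a minimal illustration under assumed names and sizes (the `ToyMedicalVQA` class, a small CNN standing in for a radiology backbone, mean-pooled word embeddings standing in for a clinical text encoder, a 200-answer vocabulary); it is not the authors' architecture.

```python
# Minimal sketch of a multimodal VQA classifier: an image encoder and a
# question encoder are fused and mapped to a fixed answer vocabulary.
# All module choices, names, and dimensions are illustrative assumptions,
# not the paper's actual model.
import torch
import torch.nn as nn


class ToyMedicalVQA(nn.Module):
    def __init__(self, vocab_size=2000, num_answers=200, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Image branch: a small CNN stands in for a radiology-image backbone.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden_dim),
        )
        # Question branch: word embeddings + mean pooling stand in for a text encoder.
        self.word_embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.text_proj = nn.Linear(embed_dim, hidden_dim)
        # Fused (joint) representation -> answer classifier.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image, question_ids):
        img_feat = self.image_encoder(image)                                   # (B, hidden_dim)
        txt_feat = self.text_proj(self.word_embed(question_ids).mean(dim=1))   # (B, hidden_dim)
        fused = torch.cat([img_feat, txt_feat], dim=-1)                        # concatenation fusion
        return self.classifier(fused)                                          # answer logits


def top1_accuracy(logits, targets):
    # Top-1 accuracy: the predicted answer is the argmax over the answer vocabulary.
    return (logits.argmax(dim=-1) == targets).float().mean().item()


if __name__ == "__main__":
    model = ToyMedicalVQA()
    images = torch.randn(4, 1, 224, 224)          # grayscale radiology-style images
    questions = torch.randint(1, 2000, (4, 12))   # token ids for 12-token questions
    answers = torch.randint(0, 200, (4,))         # ground-truth answer indices
    logits = model(images, questions)
    print(logits.shape, top1_accuracy(logits, answers))
```

In this framing, "answering" is classification over a closed set of answers, so top-1 accuracy is simply the fraction of questions whose highest-scoring answer matches the ground truth.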
Authors: Abhishek Narayanan, Rushabh Musthyala, Rahul Sankar, Anirudh Prasad Nistala, Pranav Singh, Jacopo Cirrone