
ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset (2405.10004v2)

Published 16 May 2024 in eess.IV, cs.CV, and cs.LG

Abstract: Automated medical image analysis systems often require large amounts of training data with high-quality labels, which are difficult and time-consuming to generate. This paper introduces Radiology Objects in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018, and contributes 35,705 new images added to PMC since 2018. It further provides manually curated concepts for imaging modalities, with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training of medical domain models and for evaluation of deep learning models in multi-task learning.
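For the multi-label classification use case the abstract describes, each image comes with a set of UMLS concept identifiers (CUIs) that must be turned into fixed-length label vectors before training. The following is a minimal sketch of that encoding step; the image IDs and CUIs shown are illustrative placeholders, not values taken from the actual ROCOv2 annotation files.

```python
# Hypothetical sketch: encoding per-image UMLS concept lists (CUIs) as
# multi-hot label vectors for multi-label classification.
from typing import Dict, List


def build_vocab(concepts_per_image: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every CUI observed in the annotations to a stable label index."""
    cuis = sorted({cui for cuis in concepts_per_image.values() for cui in cuis})
    return {cui: i for i, cui in enumerate(cuis)}


def multi_hot(cuis: List[str], vocab: Dict[str, int]) -> List[int]:
    """Encode one image's concept list as a multi-hot vector."""
    vec = [0] * len(vocab)
    for cui in cuis:
        vec[vocab[cui]] = 1
    return vec


# Illustrative annotations (image IDs and CUIs are made up for this sketch).
annotations = {
    "ROCOv2_img_000001": ["C0040405", "C0817096"],
    "ROCOv2_img_000002": ["C1306645"],
}
vocab = build_vocab(annotations)
labels = {img: multi_hot(cuis, vocab) for img, cuis in annotations.items()}
```

The resulting vectors can feed any multi-label head (e.g. a sigmoid output layer with binary cross-entropy), with the vocabulary size determined by the number of distinct concepts in the training split.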
