A comprehensive and easy-to-use multi-domain multi-task medical imaging meta-dataset (MedIMeta) (2404.16000v1)

Published 24 Apr 2024 in cs.CV and cs.LG

Abstract: While the field of medical image analysis has undergone a transformative shift with the integration of machine learning techniques, the main challenge of these techniques is often the scarcity of large, diverse, and well-annotated datasets. Medical images vary in format, size, and other parameters and therefore require extensive preprocessing and standardization, for usage in machine learning. Addressing these challenges, we introduce the Medical Imaging Meta-Dataset (MedIMeta), a novel multi-domain, multi-task meta-dataset. MedIMeta contains 19 medical imaging datasets spanning 10 different domains and encompassing 54 distinct medical tasks, all of which are standardized to the same format and readily usable in PyTorch or other ML frameworks. We perform a technical validation of MedIMeta, demonstrating its utility through fully supervised and cross-domain few-shot learning baselines.

Summary

  • The paper introduces MedIMeta, a comprehensive meta-dataset consolidating 19 datasets across 10 domains to advance cross-domain few-shot learning in medical imaging.
  • It standardizes images to 224x224 pixels and provides pre-made training, validation, and test splits, simplifying dataset use for both single-task and multi-task setups.
  • Fully supervised and cross-domain few-shot learning baselines serve as a technical validation of the dataset and as reference points for future methods.

Comprehensive Analysis of the Medical Imaging Meta-Dataset (MedIMeta): Facilitating Cross-Domain Few-Shot Learning

Introduction

Machine learning (ML) for medical image analysis depends on large, diverse, and well-annotated datasets, which are often scarce. This paper introduces MedIMeta, a large multi-domain, multi-task meta-dataset for developing and evaluating ML models. In particular, MedIMeta is designed for exploring and benchmarking cross-domain few-shot learning (CD-FSL) algorithms in medical imaging.

MedIMeta Dataset Overview

  • Scope: MedIMeta combines 19 medical imaging datasets covering 10 domains and 54 distinct medical tasks.
  • Task Types: Tasks range from diagnostic classification to auxiliary ones such as gender prediction, supporting both single-task and multi-task training.
  • Standardized Image Size: All images are standardized to 224x224 pixels, a common input size for pre-trained models, which reduces the need for additional preprocessing.
  • Accessibility: A Python package accompanies the dataset and handles data loading in PyTorch.
  • Pre-made Data Splits: Predefined training, validation, and test splits promote consistent benchmarking.
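To illustrate why fixed splits make results comparable, here is a minimal sketch of a deterministic split assignment. It is not the procedure used to build MedIMeta, which ships its own predefined splits; the function name `assign_split`, the hash-based rule, the split fractions, and the sample IDs are all assumptions made for this example.

```python
import hashlib


def assign_split(sample_id, val_fraction=0.1, test_fraction=0.2):
    """Map a sample ID to 'train', 'val', or 'test' deterministically.

    Hypothetical helper: the same ID always lands in the same split,
    regardless of ordering, machine, or random seed.
    """
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_fraction:
        return "test"
    if u < test_fraction + val_fraction:
        return "val"
    return "train"


# Made-up sample IDs, for illustration only.
sample_ids = [f"image_{i:05d}" for i in range(1000)]
splits = {sid: assign_split(sid) for sid in sample_ids}
counts = {
    name: sum(1 for s in splits.values() if s == name)
    for name in ("train", "val", "test")
}
print(counts)
```

Because the assignment depends only on the ID, two groups running this code evaluate on the same test samples, which is the property that predefined splits provide.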

Comparative Context

Most existing meta-datasets target non-medical applications, and only a few include medical images. MedIMeta provides a large number of medical tasks and supports multi-task learning within medical domains. This distinguishes it from datasets such as Meta-Dataset, VTAB, and MedMNIST v2, particularly in domain variety and image resolution.

Technical Validation

To validate the utility of MedIMeta, the authors ran a series of experiments:

  1. Fully Supervised Baseline: Models trained on individual MedIMeta tasks performed well, supporting the quality of the dataset.
  2. Cross-Domain Few-Shot Learning (CD-FSL): The CD-FSL baselines included ImageNet pre-training, multi-domain multi-task pre-training, and multi-domain multi-task MAML. Performance varied across tasks, indicating that the tasks differ in difficulty.
  3. Performance Assessment: The models reached notable AUROC values on most tasks, and the detailed per-task metrics help identify the harder tasks that merit further study.
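The evaluation ingredients named above can be sketched in a few lines. The block below shows one generic way to sample an N-way K-shot episode from a labeled task and to compute a binary AUROC by pairwise comparison. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `sample_episode` and `auroc`, the episode sizes, and the toy labels and scores are all hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one few-shot episode from a list of class labels.

    Returns (support, query), each a list of (sample_index, episode_label)
    pairs, where episode_label runs from 0 to n_way - 1.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough examples for support plus query are usable.
    eligible = sorted(c for c, idxs in by_class.items()
                      if len(idxs) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient examples")
    support, query = [], []
    for episode_label, cls in enumerate(rng.sample(eligible, n_way)):
        chosen = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in chosen[:k_shot]]
        query += [(i, episode_label) for i in chosen[k_shot:]]
    return support, query


def auroc(scores, targets):
    """Binary AUROC: the fraction of (positive, negative) pairs in which
    the positive sample gets the higher score; ties count as half."""
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy task with three classes of ten samples each (made-up data).
toy_labels = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
support, query = sample_episode(toy_labels, n_way=2, k_shot=5, n_query=3,
                                rng=random.Random(0))
print(len(support), len(query))  # 10 support and 6 query samples

print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise AUROC is quadratic in the number of samples, which is fine for small query sets; a rank-based implementation would be preferable for large test sets.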

Implications and Future Work

MedIMeta's variety of tasks and domains supports both algorithm development and research into generalizable models for CD-FSL. The validation experiments provide baselines for subsequent models and show how the dataset can be used to test and improve algorithms intended for real-world use. Future work might add further medical domains or tasks to extend the dataset's applicability.

Concluding Remarks

MedIMeta is a significant step toward more effective ML models in medical imaging. By giving easy access to a broad array of medical imaging tasks and supporting the development of CD-FSL algorithms, it is a valuable resource for researchers in medical image analysis.
