Rethinking Model Prototyping through the MedMNIST+ Dataset Collection (2404.15786v2)
Abstract: The integration of deep learning-based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritizing marginal performance improvements on a few narrowly scoped benchmarks over clinical applicability has slowed meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conducts a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, which argues for using lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures, emphasizing the importance of understanding the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as in future research within the field. Code is available at https://github.com/sdoerrich97.