- The paper introduces MedMNIST, a collection of ten standardized, lightweight medical image datasets for various classification tasks.
- The paper evaluates multiple AutoML models and shows that no single algorithm consistently excels across all data modalities.
- The paper highlights practical challenges like overfitting on small datasets and emphasizes the need for advanced regularization techniques.
An Analytical Overview of "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis"
The paper presents "MedMNIST," a comprehensive collection of ten pre-processed open medical image datasets, specifically tailored for classification tasks utilizing lightweight 28×28 pixel images. This initiative addresses the interdisciplinary complexities inherent in medical image analysis, making it accessible even to those without a specialized background. The datasets encapsulate a range of primary medical image modalities and pose various classification challenges, including binary/multi-class, ordinal regression, and multi-label tasks.
Dataset Composition and Characteristics
MedMNIST is designed with four core characteristics in mind:
- Educational Value: The datasets originate from various open medical image sources and are accompanied by Creative Commons licenses, ensuring their utility in educational settings.
- Standardization: All data is presented in a uniform format, ensuring ease of use without necessitating domain-specific knowledge.
- Diversity: This collection covers an extensive array of data scales—spanning from 100 to 100,000 samples—and multiple task types, which enhances its usefulness across different machine learning scenarios.
- Lightweight Nature: The small image size facilitates rapid prototyping and is conducive to the evaluation of multi-modal machine learning algorithms and AutoML methods.
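The "standardized, lightweight" claim can be made concrete with a minimal sketch. The `.npz` layout and key names below (`train_images`, `train_labels`, etc.) are assumptions for illustration and may not match the official MedMNIST release exactly; the point is that a uniform array format needs no modality-specific parsing.

```python
import numpy as np

# Build a tiny synthetic archive mimicking an assumed MedMNIST-style layout:
# 28x28 grayscale images with integer class labels. Key names are illustrative.
rng = np.random.default_rng(0)
splits = {}
for split, n in [("train", 100), ("val", 10), ("test", 10)]:
    splits[f"{split}_images"] = rng.integers(0, 256, size=(n, 28, 28), dtype=np.uint8)
    splits[f"{split}_labels"] = rng.integers(0, 2, size=(n, 1), dtype=np.uint8)
np.savez("toy_medmnist.npz", **splits)

# Loading is uniform across datasets: one call, fixed keys, fixed shapes.
data = np.load("toy_medmnist.npz")
x_train, y_train = data["train_images"], data["train_labels"]
print(x_train.shape, y_train.shape)  # (100, 28, 28) (100, 1)
```

Because every dataset shares this shape convention, the same training loop can be pointed at any of the ten tasks without format-specific code.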
MedMNIST Classification Decathlon
Inspired by the Medical Segmentation Decathlon, MedMNIST includes the Classification Decathlon as a benchmark to facilitate comparison of AutoML algorithms across all ten datasets. This initiative addresses the existing gap in benchmarks for AutoML within the domain of medical image analysis. The Decathlon requires algorithms to demonstrate efficacy without manual tuning, thus functioning as a testbed for assessing algorithmic performance in a standardized manner.
Performance Evaluation
To evaluate the capabilities of current methodologies, a selection of baseline models was employed, including ResNets with early-stopping strategies, open-source AutoML tools like auto-sklearn and AutoKeras, and the commercial Google AutoML Vision platform. The metrics used for evaluation were AUC (area under the ROC curve) and ACC (accuracy), which together give a rounded picture of performance: AUC assesses continuous prediction scores independent of any decision threshold, while ACC assesses the discrete predicted labels.
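The complementary roles of the two metrics can be seen in a minimal NumPy sketch for the binary case (the function names here are illustrative, not from the paper; multi-class and multi-label settings conventionally average per-class scores):

```python
import numpy as np

def binary_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation, with tie handling."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    # Count positive-score > negative-score pairs; ties contribute half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def accuracy(y_true, scores, threshold=0.5):
    """ACC compares thresholded predictions against the true labels."""
    preds = np.asarray(scores) >= threshold
    return float(np.mean(preds == np.asarray(y_true)))

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(binary_auc(y, s))  # 0.75
print(accuracy(y, s))    # 0.75
```

Note that the two can disagree: AUC is unchanged by any monotone rescaling of the scores, whereas ACC depends entirely on where the threshold falls.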
Key Findings and Implications
The results reported in the paper indicate that Google AutoML Vision generally outperformed the other methods, demonstrating robustness across varying data scales and modalities. However, no single algorithm consistently excelled across all ten datasets, underscoring the heterogeneity of medical image analysis tasks. Auto-sklearn struggled with image data in particular, while AutoKeras performed well mainly on the larger datasets.
Evidence of overfitting, particularly on smaller datasets, points to a critical direction for future research: establishing adequate regularization to mitigate such issues. Candidate remedies include data augmentation, model ensembling, and alternative optimization strategies.
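One such regularizer, data augmentation, can be sketched in a few lines. The flip-and-shift policy below is a generic illustration, not the paper's protocol; medical images can carry orientation semantics that make flipping inappropriate, so any augmentation choice needs domain review.

```python
import numpy as np

def augment(image, rng, pad=2):
    """Random horizontal flip plus a small random shift via pad-and-crop."""
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip
    # Pad the borders, then crop a randomly offset window of the original size.
    padded = np.pad(image, pad, mode="edge")
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)
    return padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]

rng = np.random.default_rng(42)
img = np.arange(28 * 28, dtype=np.uint8).reshape(28, 28)
out = augment(img, rng)
print(out.shape)  # (28, 28)
```

Applied fresh at every training step, such transforms expand the effective sample count of a small dataset without collecting new data, which is exactly the regime where the paper observed overfitting.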
Theoretical and Practical Implications
From a theoretical standpoint, this paper emphasizes the necessity for algorithmic versatility and adaptive methodologies in AutoML to handle diverse data modalities and varying task complexities effectively. Practically, the MedMNIST collection serves as a resourceful benchmark fostering progress in the application of AutoML in medical image analysis, supporting the development of algorithms equipped to generalize well across diverse contexts.
Conclusion
The MedMNIST initiative is a valuable contribution to the field of medical image analysis. By packaging multiple datasets into one cohesive framework, it not only provides a practical tool for benchmarking but also opens avenues for educational purposes and the advancement of AutoML research. As the field progresses, incorporating MedMNIST as a staple in algorithm testing could catalyze innovations, enriching the body of research dedicated to improving automated medical diagnostics.
This paper sets a foundation for future investigations aimed at optimizing AutoML strategies, ultimately contributing to the improvement of medical image classification methodologies. Such research is pivotal, as it holds the potential to significantly enhance the speed and accuracy of medical diagnostics, benefiting practitioners and patients alike.